Creators/Authors contains: "Nguyen-Tang, Thanh"


  1. We study learning in a dynamically evolving environment modeled as a Markov game between a learner and a strategic opponent that can adapt to the learner’s strategies. While most existing work on Markov games focuses on external regret as the learning objective, external regret becomes inadequate when the adversary is adaptive. In this work, we focus on policy regret – a counterfactual notion that aims to compete with the return that would have been attained if the learner had followed the best fixed sequence of policies in hindsight. We show that if the opponent has unbounded memory or is non-stationary, then sample-efficient learning is not possible. For memory-bounded and stationary adversaries, we show that learning remains statistically hard if the set of feasible strategies for the learner is exponentially large. To guarantee learnability, we introduce a new notion of consistent adaptive adversaries, wherein the adversary responds similarly to similar strategies of the learner. We provide algorithms that achieve √T policy regret against memory-bounded, stationary, and consistent adversaries. (A rough formalization of policy regret is sketched in the notes after this list.)
    Free, publicly-accessible full text available December 1, 2025
  2. We study adversarially robust transfer learning, wherein, given labeled data on multiple (source) tasks, the goal is to train a model with small robust error on a previously unseen (target) task. In particular, we consider a multi-task representation learning (MTRL) setting, i.e., we assume that the source and target tasks admit a simple (linear) predictor on top of a shared representation (e.g., the final hidden layer of a deep neural network). In this general setting, we provide rates on the excess adversarial (transfer) risk for Lipschitz losses and smooth nonnegative losses. These rates show that learning a representation using adversarial training on diverse tasks helps protect against inference-time attacks in data-scarce environments. Additionally, we provide novel rates for the single-task setting. (The shared-representation model and the adversarial risk are sketched in the notes after this list.)
    Free, publicly-accessible full text available December 1, 2025
  3. We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with offline datasets from different tasks that share a common representation and is tasked with learning that shared representation. We theoretically investigate offline multitask low-rank RL and propose a new algorithm called MORL for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline, and online scenarios, where a new task that shares the same representation as the upstream offline tasks is introduced to the agent. Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline tasks instead of directly learning the representation of the low-rank model. (The low-rank structure is sketched in the notes after this list.)
    Free, publicly-accessible full text available December 1, 2025
  4. We study the statistical complexity of offline decision-making with function approximation, establishing (near) minimax-optimal rates for stochastic contextual bandits and Markov decision processes. The performance limits are captured by the pseudo-dimension of the (value) function class and a new characterization of the behavior policy that strictly subsumes all previous notions of data coverage in the offline decision-making literature. In addition, we seek to understand the benefits of using offline data in online decision-making and show nearly minimax-optimal rates in a wide range of regimes. (An example of an earlier coverage notion is sketched in the notes after this list.)
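
Notes. The sketches below formalize, in LaTeX, some of the notions referenced in the items above. In each case the notation is an assumption made for illustration and may differ from the papers' own definitions.

Policy regret (item 1): one common single-policy form, assuming the opponent's round-t response ν_t depends only on the learner's last m policies (its memory bound), is

    \mathrm{PolicyReg}(T) \;=\; \max_{\pi \in \Pi} \mathbb{E}\Big[\sum_{t=1}^{T} r_t\big(\pi,\ \nu_t(\pi,\dots,\pi)\big)\Big] \;-\; \mathbb{E}\Big[\sum_{t=1}^{T} r_t\big(\pi_t,\ \nu_t(\pi_{t-m+1},\dots,\pi_t)\big)\Big].

Unlike external regret, the comparator term re-simulates how the adaptive opponent would have responded had the learner played the fixed policy π throughout.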
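Robust transfer (item 2): the shared-representation model and the adversarial risk can be written, for an assumed perturbation budget ε and loss ℓ, as

    f_t(x) \;=\; \langle w_t, \phi(x) \rangle \quad (t = 1,\dots,T \text{ source tasks}), \qquad \mathcal{R}^{\mathrm{adv}}(w,\phi) \;=\; \mathbb{E}_{(x,y)}\Big[\max_{\|\delta\| \le \epsilon} \ell\big(\langle w, \phi(x+\delta)\rangle,\ y\big)\Big].

The excess adversarial transfer risk then measures how much larger this quantity is on the target task for the learned pair (w, φ) than for the best admissible pair.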
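Low-rank MDP (item 3): the structure is commonly formalized, for an assumed feature dimension d, as

    P(s' \mid s, a) \;=\; \langle \phi(s,a),\ \mu(s') \rangle, \qquad r(s,a) \;=\; \langle \phi(s,a),\ \theta \rangle,

where φ : S × A → R^d is shared across the upstream tasks and (μ, θ) are task-specific; the representation-learning goal is to recover φ (up to the usual ambiguities) from the pooled offline data.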
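Data coverage (item 4): one example of the earlier coverage notions such a characterization subsumes is the single-policy concentrability coefficient,

    C^{\pi} \;=\; \sup_{s,a} \frac{d^{\pi}(s,a)}{d^{\mu}(s,a)},

where d^π and d^μ are the state-action occupancy measures of the comparator policy π and of the behavior policy μ that generated the offline data; a smaller C^π means the dataset covers π better.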